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We study the Blackwell-Rao (BR) estimator of the probability distribution of the angular power 
spectrum, P(Ci\d), by applying it to samples of full-sky no-noise CMB maps generated via Gibbs 
sampling. We find the estimator, given a set of samples, to be very fast and also highly accurate, 
as determined by tests with simulated data. We also find that the number of samples required 
for convergence of the BR estimate rises rapidly with increasing £, at least at low £. Our existing 
sample chains are only long enough to achieve convergence at £ < 40. In comparison with P(Ce\d) as 
reported by the WMAP team we find significant differences at these low £ values. These differences 
lead to up to ~ 0.5 a shifts in the estimates of parameters in a 7-parameter ACDM model with 
non-zero dn s /dlnfc. Fixing dn s /dlnfc = makes these shifts much less significant. 

Unlike existing analytic approximations, the BR estimator can be straightforwardly extended for 
the case of power spectra from correlated fields, such as temperature and polarization. We discuss 
challenges to extending the procedure to higher I and provide some solutions. 



PACS numbers: 95.85Bh,95.75.-z,98.80.Es,98.70.Vc 

I. INTRODUCTION 

As predicted 0,013, observations of the cosmic mi- 
crowave background (CMB) anisotropies (e.g. 0,0,13) 
have provided very tight constraints on cosmological pa- 
rameters (e.g. 0,0 E|)- The standard approach to es- 
timating cosmological parameters, given a map of the 
CMB, is to first estimate the probability distribution 
of the angular power spectrum from the map or time- 
ordered data, P(Cf|d), and then use P(Ce\d) to get 
the probability distribution of the cosmological param- 
eters assuming some model. While it is possible to esti- 
mate the cosmological parameters without ever estimat- 
ing P(Cg\d), going through this intermediate step has 
several advantages. Chief among these is that one can es- 
timate parameters for many different parameter spaces, 
each time starting from the same P(Ci\d) instead of from 
an earlier point in the analysis pipeline, thereby reducing 
demands on computer resources. 

The path from P(Ci |d) to cosmological parameter con- 
straints is most often traversed by thegeneration of a 
Monte Carlo Markov chain (MCMC) [Tfl, O, The 
chain is a list of locations in the cosmological parameter 
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space which has the useful property that the probability 
that the true value is in some region of parameter space 
is proportional to the number of chain elements in that 
region of parameter space. The chain is generated using 
a Metropolis-Hastings algorithm that requires evaluation 
of P(Ci\d) at tens of thousands of randomly chosen trial 
locations. 

At low £ P(Ci\d) is significantly non-Gaussian. Non- 
Gaussian analytic forms, whose parameters can be esti- 
mated from the data, have been investigated 0, 0, 
and widely used. The validity of these analytic approxi- 
mations however is not under rigorous mathematical con- 
trol. It is established on a case-by-case basis by compar- 
ison with computationally expensive brute-force evalua- 
tions of P(Ci\d). Further, these comparisons do show 
some level of discrepancy which may be significant for 
parameter estimation. 

Here we calculate P(C/\d) with the BR estimator as 
suggested by Wandelt et al. This estimator is a sum 
over P{C(\si) where the are a chain of possible all-sky 
signal maps produced as a by-product of the Gibbs sam- 
pling procedure. The BR estimator has some appealing 
properties. First, it is exact in the limit of an infinite 
number of samples. Second, given the samples, it can be 
very rapidly calculated. 

Of course, the BR estimator is only accurate given a 
sufficient number of samples for convergence. We study 
convergence of the BR estimate from samples generated 
from first-year WMAP Q, V and W band data as de- 



2 



scribed by Eriksen et al. [l7| and O'Dwyer et al. [3. We 
find that the number of samples rises exponentially with 
increasing maximum multipole considered, £ meiK , due to 
the rising volume of the space to be explored. Beyond 
£ ~ 40 we need more samples than the 955 that we have. 

Even at £ < 30 where our BR estimate has converged, 
we find significant differences between our BR-estimated 
P(Ci |d) and that given by the WMAP team as described 
by Hinshaw et al. (hereafter H03) and Verde et al. 
(l3|. These differences are not due solely to BR though, 
but the combined effect of a number of differences in 
our analysis procedures. To investigate the significance 
of these differences we estimate cosmological parameters 
in two cases: i) using the WMAP team's description 
of P(Ce |d) and ii) using a hybrid scheme where we re- 
place the WMAP team's P(C e \d) at £ < 30 with the 
BR estimate. Assuming a zero mean curvature ACDM 
cosmology parameterized by the primordial power spec- 
trum amplitude and power-law index, reionization red- 
shift, baryon density, cold dark matter density and a 
cosmological constant, we find no significant changes to 
the parameter constraints. With this model, the data at 
£ > 30 can be used to predict the low £ behavior suffi- 
ciently well that the low I P(Ci\d) differences are unim- 
portant. However, when we allow a logarithmic scale- 
dependence to the power-law spectral index, the high £ 
data cannot predict the low I data as accurately and the 
discrepancies at low £ are important. We find that the 
evidence for a running index is weakened when using our 
improved description of the likelihood. 

That small differences in P(Ci\d) can lead to sig- 
nificant differences in parameter constraints has been 
pointed out already by Slosar et al. [2jj. They also used 
a hybrid procedure, calculating the I < 12 likelihood 
of the parameters directly from a coarsened version of 
the WMAP maps at every step in the generation of the 
Markov chain. They further used a more conservative 
treatment of the uncertainty from foreground contam- 
ination than was used in our and the WMAP team's 
own modeling. Nevertheless, Slosar et al. [2(j also find 
significantly weakened evidence for non-zero running, in 
agreement with the present analysis. 

Our current inability to use Gibbs sampling for pa- 
rameter estimation over the whole range of £ (entirely 
bypassing analytic approximations to P[Ci\A)) is unfor- 
tunate. With the inclusion of foregrounds (in particular 
point sources) and/or with data from multiple detectors, 
each with their own beam profile uncertainties, reliable 
analytic descriptions of the uncertainty in Ce at high £ 
do not exist either. In principle, sampling approaches 
can take these uncertainties into account with arbitrary 
accuracy. Below we discuss challenges to extending sam- 
pling procedures to high £. Further, we demonstrate that 
a simple modification to the BR estimator can dramati- 
cally reduce the number of independent samples required 
for convergence. 

The BR estimate, given samples of maps for tempera- 
ture and polarization as well, can easily be extended to 



estimate the joint probability distribution of the temper- 
ature and polarization auto- and cross-correlation power 
spectra. In contrast, there are no other existing methods 
for describing this probability distribution other than ex- 
pensive brute-force direct evaluation from the maps, or 
neglect of the cross-correlations in the power spectrum 
errors. Neglecting these correlations can lead to signifi- 
cant errors [2 if . 

A strong case for a hybrid estimator, similar to the 
one used in the current paper, was made by Efstathiou 
|22| . The idea was to use an approximate, but com- 
putationally cheap, pseudo-C^ method at high £, and a 
more accurate quadratic estimator at low Cs where the 
pseudo-C^ approach is significantly sub-optimal. Here we 
point out that Gibbs sampling together with the BR es- 
timator can replace the quadratic estimator for the low £ 
range. Certainly, the computational cost is significantly 
higher because of the sampling stage, and the method 
does not lend itself as easily to Monte Carlo simulations. 
But BR does have several advantages. First, the com- 
plete description of uncertainties due to monopole and 
dipole subtraction, foreground marginalization and cor- 
related noise is much more transparent in this approach. 
Second, the computational scalings of the two methods 
are very different, implying that the "low" £ regime can 
be extended to significantly higher multipoles with the 
Gibbs sampling method than with the quadratic estima- 
tor. Finally, the BR estimator accurately describes the 
significantly non-Gaussian distribution, P(Ci |d), which 
is assumed to be Gaussian in |22| • 

In section II we briefly review Gibbs sampling and the 
BR estimator. In section III we discuss convergence. In 
Section IV we compare BR with the analytic approxi- 
mations of the WMAP likelihood code in a 2-parameter 
space of amplitude and tilt, demonstrating the conver- 
gence of the chains and our discrepancies with WMAP. 
In section V we present the cosmological parameter re- 
sults. In section VI we discuss modifications to BR to 
allow extension to higher £ values. In section VII we 
conclude. 



II. GIBBS SAMPLING AND THE 
BLACKWELL-RAO ESTIMATOR 

The current paper is a natural extension of the work 
on CMB analysis through Gibbs sampling initiated by 
Jewell et al. [23| and Wandelt et al. 0, and applied 
to the first-year WMAP data by Eriksen et al. and 
O'Dwyer et al. [l^. Here we only briefly review the con- 
ceptual points behind this method, and refer the inter- 
ested reader to those papers for full details. 

In this paper we focus on the first-year WMAP data, 
in which case the observed data may be written in the 
form 

d = As + n. (1) 

Here d is a noisy sky map, s is the true sky signal, A 
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denotes beam convolution, and n is instrumental noise. 
The sky signal is assumed to be Gaussian distributed 
with zero mean and a harmonic space covariance matrix 



CeSwSr, 



The noise is also assumed to 



be Gaussian distributed, with zero mean and a pixel- 
space covariance matrix N.y = Oq / y/N \, s (i)Sij which is 
perfectly known. 



A. Elementary Gibbs Sampling 

Our goal is to establish the posterior probability dis- 
tribution P(Ce\d). Since all quantities are assumed to 
be Gaussian distributed, this can in principle be done by 
evaluating the likelihood function (and assuming a prior) . 
However, this brute-force approach involves determinant 
evaluation of a mega-pixel covariance matrix for modern 
data sets, and is therefore computationally unfeasible. 
An alternative approach was suggested by Jewell et al. 
[2^ | and Wandelt et al. 0] , namely to draw samples from 
the posterior, rather than evaluate it. 

While it is difficult to sample from P(CV |d) directly, 
it is in fact fairly straightforward to sample from the 
joint probability distribution P(Ce, s|d) using a method 
called Gibbs sampling 0, E3 : Suppose we can sam- 
ple from the conditional distributions P(Ci\s, d) and 
P(s\Ce, d). Then the theory of Gibbs sampling says that 
samples (s l , C\) can be drawn from the joint distribution 
P(Cg, s|d) by iterating the following sampling equations, 

J+i 



s 

Cl +1 



P(s\C},d), 
P(C e \s l+1 ). 



(2) 
(3) 



The symbol '<— ' indicates that a random vector is drawn 
from the distribution on the right hand side. After some 
burn-in period, the samples will converge to being drawn 
from the required joint distribution. Finally, P(Ci\d) is 
found by marginalizing over s. 

How to sample from the required conditional densi- 
ties is detailed by Jewell et al. 0, Wandelt et al. 
and Eriksen et al. These papers also describe how 

to analyze multi-frequency data, as well as how to deal 
with complicating issues such as partial sky coverage and 
monopole and dipole contributions. It is also straightfor- 
ward to include several forms of foreground marginal- 
ization within this framework, and the uncertainties in- 
troduced by any such effects are naturally expressed by 
the properties of the sample chains; no explicit post- 
processing is required. 



B. Parameter Estimation and the Blackwell-Rao 
Approximation 

By 'parameter estimation' we mean mapping out the 
posterior distribution P(8\d), where 9 is the desired set of 
parameters. This is usually done by first choosing some 
set of parameters from which a corresponding power spec- 
trum is computed by numerical codes such as CMBFast. 



Second, the distribution value for the chosen parameters 
are then found by estimating P(Ci(9) |d). This procedure 
is then either repeated over a grid in the parameters, or 
incorporated into an MCMC chain. 

Thus to estimate parameters we must be able to eval- 
uate P(Ct\d) for any model Cg. While we could compute 
the histogram of the Gibbs Ci samples and simply read 
off the appropriate values, the BR estimator suggested 
for this purpose by Wandelt et al. |l6( converges more 
rapidly. First we expand the signal sample s in terms of 
spherical harmonics, 

00 1 
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and define its realization-specific power spectrum o~i> by 
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Next we note that 

P(Q(0)|s,d) = P(Q(0)|s), 



(6) 

since the power spectrum only depends on the data 
through the signal component. Furthermore, it only de- 
pends on the signal through oi, and not its phases, and 
therefore 



P(C e (6)\s) = P(C e (9)\a e ). 
We may then write 



P(Ce|d)= fp(C e ,s\d)ds 

= J P(Q|s,d)P(s|d) ds 



P(Ct\<Tt)P{*e\d)T>a e 
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where Nq is the number of Gibbs samples in the chain. 
This is called the Blackwell-Rao (BR) estimator for the 
density P(C£|d). Its meaning is illustrated in Fig.Q] 

The expression in Eg. 1111 is very useful because, for a 
Gaussian field, 
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up to a normalization constant. Eq. ^| is straightfor- 
ward to compute analytically, and an arbitrarily exact 
representation of the posterior (with increasing ./Vg) may 
therefore be established conveniently by means of Eqs. 
[TT]and[T3| 
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FIG. 1: A one-dimensional illustration of the BR estimator. 
The thin lines indicate the P(Ce\crl) distributions, and the 
thick line shows their average. This average converges toward 
the true density P(Ce\d) as the number of samples increases. 



C. Comparison with Brute- Force Likelihood 
Evaluation 

In order to verify that the method works as expected, 
we apply it to a simulated map, and compare the results 
to a brute-force evaluation of the likelihood. Since this 
likelihood computation requires inversion of the signal 
plus noise covariance matrix, we limit ourselves to a low- 
resolution case, with properties similar to those of the 
COBE-DMR data |2(|, but with significantly lower noise. 
Specifically, we simulate a sky using the best-fit WMAP 
power law spectrum, including multipolcs between £ = 2 
and 30. We then convolve this sky with the DMR beam, 
add 0.5% of the 53 GHz DMR noise (in order to regularize 
the covariance matrix as the beam drops off), and finally 
we apply the extended DMR sky cut. 

This simulation is then analyzed both using the Gibbs 
sampling and BR machinery as described above, and by 
computing the full likelihood over a parameter grid using 
the Cholesky decomposition method of Gorski [27| • The 
model power spectrum chosen for this exercise is of the 
form 

Ci(9> »)=?(£) Cf d , (14) 

where q is an amplitude parameter, n is a spectral in- 
dex, l§ is a reference multipole, and Cf d is a fiducial 
power spectrum, which we take to be that of a flat ACDM 
model that fits the data well. The fiducial spectrum is 
chosen to be the input spectrum, and consequently, we 
should expect the likelihood of the parameters to peak 
near (q,n) = (1,0). 

The comparison between the brute-force evaluation 
and the BR approximation is not quite as straightforward 
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FIG. 2: Contours in (q, n) space of constant probability given 
the simulated data described in text, for both the BR es- 
timator (solid) and brute-force evaluation of the likelihood 
(dashed). Contours are where — 21nP(C^|d) rises by 0.1, 2.3, 
6.17, and 11.8 from its minimum value, corresponding (for 
Gaussian distributions) to the peak, and the 1, 2 and 3u con- 
fidence regions. 



as one would like. The problem lies in how to truncate 
the spherical harmonics expansion at high ts. The brute- 
force likelihood computation requires that the full signal 
component is contained in the included harmonic expan- 
sion, which means that the noise power has to be larger 
than the convolved signal power before truncation. On 
the other hand, the Gibbs sampling approach requires a 
large number of samples to converge in this low signal to 
noise regime. The simulation was therefore constructed 
as a compromise: a very small amount of noise was added 
to make the covariance matrix well-behaved at the very 
highest ts included, but not more than necessary. Still, 
small differences between the two approaches must be 
expected. 

The results from this exercise are shown in Fig. 
The contours show the lines of constant likelihood where 
-2mP(Q|d) rises by 0.1, 2.3, 6.17 and 11.8 from the 
minimum, corresponding to the peak and the 1,2, and 3cr 
regions for a Gaussian distribution. The solid lines show 
the results from the BR computation, and the dashed 
lines show the results from the exact likelihood compu- 
tation. Obviously, the agreement between the two dis- 
tributions is excellent, considering the very different ap- 
proaches taken, and the above-mentioned high-^ trunca- 
tion problem. 
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(a) Convergence ratio / = 0.06 



(b) Convergence ratio / = 0.47 



FIG. 3: Illustration of the convergence criterion denned by Eq. 1151 If / = 0, the two distributions overlap perfectly, while if 
/ = 2, then they are completely separated. The two distribution pairs shown here have / = 0.06 and / = 0.47, respectively. 



III. CONVERGENCE OF THE BR ESTIMATOR 
APPLIED TO WMAP DATA 



The ultimate goal of this paper is to apply the methods 
described above to the first-year WMAP data. In order 
to do so, we first need to determine the accuracy of the 
BR estimator given our finite number of samples. In this 
section we do so by examining how the BR estimator 
fluctuates as different subsets of the Gibbs chains are 
used. 

The Gibbs machinery was applied to the first-year 
WMAP data by O'Dwyer et al. [isj|. and the primary 
results from that analysis were a number of Cg and oi 
sample chains. These chains are available to us, and 
form the basis of the following analysis. The data we use 
here are those computed from the eight cosmologically 
interesting WMAP Q-, V- and W-bands, comprising 12 
independent chains of about 80 samples each for a total 
of 955 samples. For more details on how these sam ples 
were generated, we refer the reader to O'Dwyer et al. [l8| 
and Eriksen et al. |l7|. 

The main question we need to answer before proceed- 
ing with the actual analysis is, how well does this finite 
number of samples describe the full likelihood for a given 
range of multipoles? To answer this question, we define 
a simple test based on the (q, n) model of Eq. ^]as fol- 
lows: We construct two subsets from the 955 available 
samples, each containing N s < 955/2 samples, and map 
out the probability distribution for each subset, including 



only multipoles in the range 2 < £ < £ max . From the two 
resulting probability distributions, Pi(q,n) and P2(q,n), 
we compute the quantity 



/ 



/ \Pi(q,n) - P 2 (q,n) \ dqdn 
J Pi(q, n) dgdn 



(15) 



which measures the relative normalized difference be- 
tween the two distributions; if / = then the two dis- 
tributions overlap perfectly, and if / = 2, they are com- 
pletely separated. We then increase N s until / < 0.05 for 
the first time. Two sets of such distributions are shown 
in Fig. 13 having / = 0.06 and / = 0.47 respectively. 

Of course, the chain is likely to go in and out of con- 
vergence as N s is increased further for quite some time, 
and therefore there will be a large random contribution to 
this particular statistic. For that reason we repeat the ex- 
periment eleven times, each time scrambling the full 955 
sample chain, and define the median of the resulting N" s 
as the number of samples required for convergence [2l|. 
The process is then repeated for various values of l max . 

The results from this exercise are summarized in Fig.0] 
Two important conclusions may be drawn from the in- 
formation shown in this plot. First, the number of sam- 
ples required for convergence increases very rapidly with 
f max , possibly following an exponential. However, the fit 
is less than perfect, and the true function may possibly 
be steeper at the very smallest multipoles than the larger 
ones, which would be helpful when probing the higher l- 
range. Unfortunately, the limited number of available 
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FIG. 4: The number of samples required for convergence in 
the BR estimator of the first-year WMAP data, as defined in 
Section UTTI The dots indicate the results computed from the 
data as described in the text, while the solid line indicates 
a exponential best-fit. The dashed lines indicate the limit 
above which we cannot probe because of the limited number 
of available samples. For this calculation we chose / = 0.05. 
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FIG. 5: Constraints on q and n where Ce(q, n) = q (£/£o) n Cf d 
and Cf d is a fiducial ACDM power spectrum for £q = 8, 17, 
and 25 from left to right. Solid lines are for one half of the 
BR samples and dashed lines are for another half. Contour 
levels are as in Fig. |5] 



IV. BLACKWELL-RAO VS. WMAP P{C e \d) 



samples prohibits us from determining this function fur- 
ther. 

Second, while it is strikingly clear from Fig. that the 
existing number of samples is inadequate for probing the 
full multipole range properly, we may still conclude that 
the multipole range 2 < £ < 30 is quite stable given that 
we have 955 samples. In the analysis described in the 
next section, we therefore construct a hybrid likelihood 
consisting of the BR likelihood for £ < 30 and the analytic 
WMAP approximation at higher £'s. 

A second demonstration of the same result is shown 
in Fig. |21 where we have computed the two-parameter 
likelihood using the BR estimator, splitting the sample 
chain into two parts, for three disjoint ^-ranges (£ G 
[2, 12], [13,20] and [21,30]). Here we see that the esti- 
mator is very stable over each of the three £ ranges. 

We have also considered the question of burn-in of the 
12 independent sample chains, by repeating the analyses 
described above with reduced chains. Specifically, we 
removed the five or ten first samples from each chain, 
and repeated the analyses. Neither result changed as 
an effect of this trimming, implying that burn-in is not a 
problem for the Gibbs sampling approach at low ts when 
the estimated WMAP spectrum is used to initialize the 
Gibbs chains. This result is in good agreement with the 
results presented by Eriksen et al. 0, who showed that 
the correlation length of the Gibbs chain is virtually zero 
when the signal-to-noise is much larger than one. 



There are a number of differences between our analysis 
and the WMAP team's analysis. Here we examine the 
resulting differences in P(Ci |d) and in the next section 
on estimates of cosmological parameters. Our goal is to 
understand the significance of these low £ differences. We 
do not attempt to completely disentangle which P(Cp |d) 
differences are due to which analysis differences. 

There are at least four areas where the WMAP team's 
analysis differs from ours: 

1. They use a pseudo-C^ technique to estimate the 
most likely d; 

2. At £ < 100, in order to reduce residual foreground 
contamination they do not include Q-band data; 

3. Their pseudo-Cf estimate places zero weight on the 
auto-correlation of maps from the individual differ- 
encing assemblies; and, 

4. They use a variant of the analytic approximation 
of Bond et al. [lj] to the shapes of the likelihoods. 

A number of these differences in analysis procedures 
were discussed by H03 and Verde et al. Regarding 
item 1, one can see in H03 Fig. 12 differences at low £ 
between a maximum-likelihood analysis and pseudo-C^ 
analysis as applied to V-band data. Regarding item 2, 
in Fig. 3 of H03 one can see differences at low £ between 
inclusion and exclusion of the Q-band data. Regarding 
item 3, one can see differences at low £ in Fig. 6 of H03 
depending on whether the auto-correlations are included. 

The net result of all these effects is shown in Fig. 
and [7| In Fig. [fjl we compare the univariate likelihood 
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FIG. 6: Comparison of the BR (solid curve) and the analytic WMAP (dashed curve) univariate likelihood functions for each 
multipole up to £ = 25. The vertical lines indicates the value of the best-fit WMAP power-law model (not including a running 
spectral index). The univariate likelihood functions are computed by slicing through the multivariate likelihood, fixing all other 
multipoles at the corresponding best-fit value. Notice that all distributions shown here are strongly non-Gaussian. 



functions for all multipoles up to £ — 25, as computed 
using both the WMAP analytic approximation (dashed 
smooth line) and the BR approximation (solid smooth 
line). The BR likelihoods are computed by varying one 
single multipole at a time, keeping the other multipoles 
fixed at the best-fit power-law model value. 

There are a few clear differences between the two sets 
of distributions shown in Fig. the most prominent be- 
ing a small horizontal shift in most cases, or in other 
words, different power spectrum estimates. This was an- 
ticipated, given the differences discussed above. 

More important than these shifts are the relative 
shapes of the two distributions. Such features are most 
easily compared when the two distributions have iden- 
tical modes, which is the case for I = 2,4,9 and 14. 
In the quadrupole case we see that the BR distribution 
has a heavier tail than the WMAP distribution, while 
the opposite is true for the other three cases. On the 
other hand, we find spectacular agreement for the £ = 17 
and 18 cases. All in all, the results shown in this fig- 
ure are consistent with the idea that the Gibbs sampling 
approach is an optimal method, while the WMAP ap- 
proach is based on a pseudo-C^ method, and the latter 
is therefore expected to have slightly larger error bars. 
The only case for which this rule is obviously broken is 
the quadrupole, and thus we have reason to question the 
accuracy of this particular multipole. 

We also note that a similar analysis was carried out 
by Slosar et al. [2(j. One of their major results was a 
significantly broader likelihood than the WMAP likeli- 
hood (as well as a strong shift toward larger amplitudes) 
for t < 10. The main difference between that analy- 
sis and the present is that they marginalized over three 



foreground templates, and, given the results shown in this 
section, this additional degree of freedom is most likely 
the cause of the broadened likelihood, rather than inher- 
ently under-estimated errors in the WMAP likelihood 
code. Slosar et al. 20] also found a coherent shift toward 
larger amplitudes. We see this ourselves to a lesser de- 
gree in Fig. El Seven out of the eleven Cg in the £ = 2 to 
12 range show some amount of shift to higher I. 



To further study the differences in P(Ci\d) between 
the BR approximation and the analytical approximation 
used by the WMAP team, we once again adopt the two- 
parameter non-physical model defined in Equation 1141 
Ce(q,n) = q(£/£o) n Cf d , and map out in Fig. [7| the two 
likelihoods in q and n using the two approximations. We 
display these likelihoods over the same £ ranges as in 
Fig. [SJ We can see in the left panel (the £ — 2 to 12 
range) clear evidence for a shift to higher power hinted 
at by the individual multipole distributions in Fig-EI The 
peak shifts by ~ 3/4<r and the BR contours are slightly 
tighter than the WMAP ones. Discrepancies are smaller 
in the t = 13 to 20 range and smaller still in the 21 to 
30 range, especially near n = 0. Note that the likelihood 
a t \ n \ <L 1 is irrelevant for physical models since their 
spectral shapes do not deviate that strongly from that of 
the fiducial. 



To summarize this section, we have seen that the BR 
estimate and that of WMAP for P(C e \d) do differ slightly 
at low Ps. This should result in differences in parameter 
estimates, to which we now turn. 
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FIG. 7: Constraints on q and n where C t (q, n) = q (£/l ) n Cf d 
and Cf d is a fiducial ACDM power spectrum for £ = 2 to 
I = 12 with £ = 8 (left panel), £ = 13 to I = 20 and £ = 17 
(center panel) and I = 21 to £ = 30 with = 25 (right 
panel). Solid lines are for BR and dashed lines are for the 
WMAP likelihood code. Contour levels are as in Figure [5] 



EFFECT ON COSMOLOGICAL 
PARAMETERS 



We now explore how significant these low £ differences 
are for estimates of cosmological parameters. We con- 
sider two different cosmological model parameter spaces. 
The first is a flat ACDM cosmology with a power- 
law primordial power spectrum. The second parame- 
ter space allows for a logarithmic scale-dependence to 
the power-law spectral index so that n s (k) = n s (ko) + 
dn s /dlnfcln(fc/£;o). The dn s /dlnfc parameter is com- 
monly referred to as the 'running of the spectral index', a 
reference to the analogous dependence of gauge coupling 
strength with energy scale in quantum field theories. 

We explore the parameter spaces via the MCMC mech- 
anism as described by, e.g., Christensen et al. For 
the 6-parameter cosmological models we use u>b, Wd, ^A, 
Zrci, A, n s (baryon density, cold dark matter density, dark 
energy density, redshift of reionization, amplitude of the 
primordial power spectrum at k = 0.05 Mpc -1 and the 
scalar index; the total matter density is uj m = uih + Wd) 
and a calibration parameter for each of CBI [23 and 
ACBAR 9, 29] . For the 7-parameter cosmological model 
we use the the same six parameters plus dn s /dlnfc. We 
evaluate the likelihood given the WMAP data with the 
subroutine available at the LAMBDAjs^ data archive. 
For CBI and ACBAR we use the offset log-normal ap- 
proximation of the likelihood 0]. The likelihood given 
all these data together (referred to as the WMAPext 
dataset by Spergel et al. Q) is given by the product of 
the individual likelihoods. For the the hybrid schemes, 
we replace the the WMAP likelihood calculation for the 
temperature power spectrum in the range 2 < i < 30 



with the BR estimator. In all cases, we employ a prior 
that is zero except for models with 0.40 < h < 0.95, 
t < 0.30, and 6.0 < z ro i |3(j in which case the prior is 
unity. All chains have 100,000 samples. 

The results for the 6-parameter case using the WMAP 
likelihood code (column 2 of the table) reproduce those 
reported by Spergel et al. 0. We see that the hybrid 
scheme leads to almost no differences, with any shifts in 
most likely values smaller than l/3<7. Thus there is only 
a very weak dependence on the differences in P(Ce\d) at 
low £. The reason for this is that with the 6-parameter 
model the data at high I tightly constrain the range of 
Cg values at low I. 

Now we turn to the difference between columns 4 and 5, 
where the only difference in their derivation is the treat- 
ment of the temperature power spectrum at I < 30. With 
the extra freedom in the 7-parameter model, the high £ 
data can no longer be extrapolated to low £ with as much 
confidence. The data at low £ are thus more informa- 
tive than in the six-parameter case and the differences 
at low £ become more important. Four parameters show 
shifts greater than l/3er: n s , wt>, w m and dn s /dlnfc. The 
biggest shift is in dn s /dlnfc. It reduces a 2.5<r detection 
to a 2a detection. 

We checked to make sure these shifts are significant, 
given our limited number of chain elements. To do so, 
we looked at 4 subsamples of the 7-parameter case chains, 
each with 25,000 samples, to examine fluctuations in the 
subsample mean values of each parameter. We found 
these subsample means to deviate from the total sample 
means with an rms of ~ 0.2ct. We thus estimate the 
sample variance error in our sample means to be ~ O.ler. 
We also ran a chain with 100,000 samples with the switch 
at I — 20, and found it to be consistent with the hybrid 
chain with the switch at £ = 30. 

The direction of the changes is consistent with Fig. [7| 
We see our own analysis has a higher level of power 
and lower level of tilt in the £ = 2 — 12 range and 
is more restrictive of upward power fluctuations in the 
£ = 13 — 20 range compared to the WMAP team's anal- 
ysis. Thus we want the model power spectra to be more 
negatively sloped at low £. This is accomplished by the 
0.026 increase in the running which reduces n s (k) at 
k = 0.009 Mpc' 1 (which projects to £ = 12) by 0.11 
from its value at kg = 0.05 Mpc -1 . 

It should be noted that the parameter values are 
strongly dependent on the high t cut. In fact we have 
found that most of the probability is at r > 0.3, as 
has been noticed for WMAP + VSA H HJ and for 
WMAP+CTSI |6j. At these high r values, the running 
tends to be negative also. Having high r and a negative 
running though is a priori unlikely in hierarchical mod- 
els of structure formation, and is also disfavored when 
large-scale structure data is included [2(| ■ 
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Parameter 



dn s /dlnfc fixed to zero dn s /dlnfc free to vary 

WMAP P(C e \d)\ hybrid | difference/a | WMAP P(Ci\d)\ hybrid | difference/a 



n s 


0.97 ±0.03 


0.97 ±0.03 


0.0 


0.880 ± 0.048 


0.903 ± 0.047 


0.4 


r 


n i ao+0.097 
U.!OZ_g 04g 


n 1 4n+ - 080 

U.1W_ 053 


0.1 


0.202 ± 0.065 


0.208 ± 0.063 


0.1 


A 


0.80 ±0.10 


0.79 ±0.10 


0.1 


0.91 ±0.11 


0.90 ±0.11 


0.1 


LU b 


0.023 ± 0.001 


0.023 ± 0.001 


0.0 


0.0215 ±0.0013 


0.0219 ±0.0012 


0.3 




0.136 ±0.014 


0.132 ±0.013 


0.3 


0.140 ± .015 


0.134 ± 0.014 


0.4 


h 


0.72 ± 0.05 


0.73 ± 0.05 


0.2 


0.682 ± 0.054 


0.708 ± 0.054 


0.5 


dris/d in k 








-0.079 ± 0.031 


-0.063 ±0.031 


0.5 



TABLE I: Cosmological parameter means and standard deviations derived from the WMAPext dataset using the WMAP 
likelihood code (columns 2 and 5) and using our hybrid approach where the WMAP likelihood code for the tempearture 
angular power spectrum is replaced at £ < 30 with our BR estimate of P(Ci |d). The columns labeled 'difference/cr' give the 
difference in the parameter means divided by the standard deviation of the hybrid method. Note that the finite number of 
chain samples gives rise to an uncertainty in each mean of ~ O.Ict. 



VI. EXTENDING BR TO HIGHER tS 

Wc face two challenges to extending the BR estimator 
to higher £ values. The first is that the greater the range 
of I values, the greater the volume of parameter space to 
be explored (in units of the width of the posterior in each 
direction) and therefore the larger the number of samples 
required. The second is that as the signal-to-noise ratio 
drops below unity, the correlation length of the Gibbs 
samples, produced b y th e algorithms of Wandelt et al. 
[l6| and Jewell et al. |23j. starts to get very long thereby 
reducing the effective number of independent samples. 
We do not address this second problem here, which be- 
comes important around £ ~ 350, except to say that we 
are currently implementing potential solutions. 

We see evidence of the first problem in Fig. Here 
we discuss two solutions, both of which rely on the low 
level of dependencies between the errors in Cg at differ- 
ent I values. For the first solution, we replace the BR 
estimate with a 'band BR' estimate where the averaging 
over samples is done in discrete bands of I that are then 
multiplied together. Specifically, 

P({C t }\d) = U B (P(C l<(B) ,C l<{B)+1 ...,C l>(B) 

\ a l < {B)^ l l < {B) + l--^ (T l > {B))) ( 16 ) 

where (...) indicates averaging over samples and the lower 
and upper I values of each band B are denoted by £< (B) 
and t>{B). 

The advantage of the band BR estimator is that it 
reduces the volume of the space to be explored from one 
with £ max — £ min + 1 dimensions to a product over spaces 
with number of dimensions equal to the width of the 
bands, greatly reducing the volume in units of the extent 
of the posterior. The approximation here ignores inter- 
band dependencies. Tests though have shown these to be 
negligibly small for bands of width 12. 

To demonstrate the reduction in the number of samples 
necessary for convergence, we re-do Fig.0] In Fig.0J^ min 




20 40 60 80 100 

Smallest multipole included, / . 

1 min 



FIG. 8: The number of samples required for convergence for 
the band BR estimator of the first-year WMAP data, as de- 
fined in Section lTTTl for bands extending from l m m to £ m i n ±ll. 
The dots indicate the results computed from the data. 



was fixed to 2 as £ max increased. Here as £ max increases 
so does £ min so that £ max — £ m j n ±1 = 12. We see in 
Fig. [H] that switching to the band BR estimator flattens 
out the trend of necessary number of samples with £ max . 

It may be possible to exploit the near-independence of 
different I values further. We can use BR (or even a fit 
to the histogram of Cg values in the chain) to estimate 
univariate marginalized distributions, multiply these to- 
gether as if they were independent, and then correct 
for the correlations with an analytic correction factor. 
Namely, 



lnP({Ca|d) = £ln(P(Q|d) ) 



E 



2Cgg 



^dCiFwSCi, /2 (17) 



where SCg = Cg — (Ci), Fw is the Cg Fisher matrix 
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and Cw is its inverse. These matrices can be computed 
as in H03. Note that the above expression is exact for a 
Gaussian distribution, with the first term in the log of the 
correction factor simply canceling out the sum of the logs 
of the marginalized one-dimensional distributions. Such 
a procedure will only require a handful of independent 
samples. Further, one could combine our two solutions 
here by using band BR with an analytic correction for 
the neglected inter-band dependencies. 

Certainly this use of analytics could be extended fur- 
ther to reduce the demand for number of independent 
samples. We expect that an adequate analytic form can 
be found for the posterior. One would then use the BR 
estimator, or the Cg samples, to fit the parameters of this 
analytic form. Such an approach could greatly reduce the 
demand for the number of independent samples. Essen- 
tially, we would be exploiting the fact that P({Cg }|d) is 
a very smooth distribution with a lot of regularity, such 
as the structure of inter-^ correlations and shapes of uni- 
variate distributions. Such an approach will probably be 
necessary in the high I regime where larger correlation 
lengths (at least for current sampling techniques) greatly 
reduce the number of independent samples. 

In the low signal-to-noise regime the number of inde- 
pendent samples required, even to explore the posterior 
for a single i value, increases because the width of the 
BR estimator from an individual sample is much smaller 
than the width of the posterior (since the former is for a 
noiseless sky). This problem can be mitigated by artifi- 
cially broadening the BR kernel. Specifically, we would 
set 



In (P(C,M) = V± 



-— + In [ — 

Cg \Cg 



and ng = (21 + 1) (1 + aNg/Cg)~ 



(18) 



where Ng is the noise contribution to the power spec- 
trum of the map. Setting a > broadens the kernel 
for each sample from cx Cg to oc (Cg + aNg). Unfortu- 
nately it also broadens the posterior from cx (Cg + Ng) 
to cx (Cg + (1 + a)Ng). Thus one must choose a small 
enough so the posterior is not overly broadened. At 
high Cg/Ng this broadening makes no difference. At low 
Cg/Ng the sample kernel is broadened by a large factor 
(1 + aNg/Cg) while the posterior is broadened only by 
1+a. Thus one can broaden the sample kernel in the low 
signal-to-noise regime (exactly where we want to broaden 
it) by a very large amount, without significantly broad- 
ening the posterior. The number of independent samples 
required for convergence will drop by this same factor. 
Finally, we mention one more way to reduce the di- 



mensionality of the space to be explored, and thus the 
number of samples required. And that is to replace the 
Cg's with band powers. In the low signal-to-noise regime 
such a replacement need not lead to significant loss of 
information, assuming models with smooth Cg's. 
VII. CONCLUSIONS 

We have found BR to be a useful step in the process 
of converting CMB anisotropy data, and a model of it, 
into estimates of P(Cg |d). We have shown that precise 
characterization of this distribution at low t is a key step 
in the estimation of cosmological parameters. The differ- 
ences between P(Cg |d) as computed by us with a hybrid 
approach that uses BR at £ < 30 and as computed by 
the WMAP team can lead to important differences in 
estimates of cosmological parameters. 

The BR estimator converges rapidly at low £, but re- 
quires many independent samples at high I. By exploit- 
ing the weak inter-^ dependence in P(Cg\d) we were able 
to modify the BR estimator to greatly improve conver- 
gence without significantly sacrificing accuracy. Exten- 
sions that will allow its use with correlated data, such 
as temperature and polarization, or weak lensing shear 
from multiple redshift bins, and to higher I are worth 
pursuing. 
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